Polymorphic Data
Overview
Polymorphism in TimeBase is the ability of a single stream to contain multiple message types. See Basic Concepts and Messages to learn more.
In the following example we query tickquerydemo
stream for all records. As the result, we get messages of different types: TradeMessage and BestBidOfferMessage.
#Shell format
==> SELECT * FROM tickquerydemo
>deltix.timebase.api.messages.TradeMessage,TIMESTAMP,SYMBOL,TYPE,price,size
0,2011-10-17 17:21:41,XBANK,EQUITY,301.25,800.0
>deltix.timebase.api.messages.BestBidOfferMessage,TIMESTAMP,SYMBOL,TYPE,offerPrice,offerSize,bidPrice,bidSize
1,2011-10-17 17:21:42,XBANK,EQUITY,301.5,60000.0,298.5,800.0
2,2011-10-17 17:21:43,GREATCO,EQUITY,45.0,100.0,43.0,400.0
3,2011-10-17 17:21:43,XBANK,EQUITY,299.5,40000.0,295.0,300.0
Create Polymorphic Output
RECORD
You can use the RECORD ... TYPE ... WHEN
construction to return a polymorphic data set, where TYPE
sets a new type, and WHEN
sets a condition.
-- This query returns a polymorphic dataset with fields of both classes TradeMessage and BestBidOfferMessage
WITH
entry AS L1Entry AS l1
SELECT
RECORD
entry.price FIELD "price",
entry.size FIELD "size"
TYPE "TradeMessage"
WHEN entry IS KrakenTradeEntry
RECORD
l1[side == ASK].price FIELD "offerPrice",
l1[side == ASK].size FIELD "offerSize",
l1[side == BID].price FIELD "bidPrice",
l1[side == BID].size FIELD "bidSize"
TYPE "BestBidOfferMessage"
WHEN entry IS L1Entry
FROM kraken
ARRAY JOIN entries AS entry
UNION
You can also produce a polymorphic data set using the UNION construction.
info
Refer to UNION to learn more about this feature.
-- UNION of these two queries will return a polymorphic dataset with fields of both classes TradeMessage and BestBidOfferMessage
SELECT
trade.price AS "price",
trade.size AS "size"
TYPE "TradeMessage"
FROM kraken
ARRAY JOIN entries[THIS IS KrakenTradeEntry] AS trade
UNION
SELECT
bbo[side == ASK].price AS "offerPrice",
bbo[side == ASK].size AS "offerSize",
bbo[side == BID].price AS "bidPrice",
bbo[side == BID].size AS "bidSize"
TYPE "BestBidOfferMessage"
FROM kraken
ARRAY JOIN (entries AS array(L1entry))[THIS IS NOT NULL] AS bbo
Polymorphic Union Model and GROUP BY SYMBOL
Selecting *
means simply selecting the current message - alone. Selecting any other combination of fields implies the construction of a new message type.
For example, we can try and select the combination of price
, bidPrice
, offerPrice
. Even though these fields are picked from two different message types, this is a legal request. Its meaning is "give me the last price and quote values as of the time of each event". The result of such a query is not very useful in our scenario, since it mixes the data of different symbols:
#Shell format
SELECT price, bidPrice, offerPrice FROM tickquerydemo
>_,TIMESTAMP,SYMBOL,TYPE,price,bidPrice,offerPrice
0,2011-10-17 17:21:40,GREATCO,EQUITY,_,42.5,43.5
1,2011-10-17 17:21:40,XBANK,EQUITY,_,301.25,301.75
2,2011-10-17 17:21:41,XBANK,EQUITY,301.25,_,_
3,2011-10-17 17:21:42,XBANK,EQUITY,_,298.5,301.5
4,2011-10-17 17:21:43,GREATCO,EQUITY,_,43.0,45.0
5,2011-10-17 17:21:43,XBANK,EQUITY,_,295.0,299.5
6,2011-10-17 17:21:44,GREATCO,EQUITY,44.0,_,_
!end
While executing this query, the QQL engine creates an object called Query State, containing, in our example, three fields: price
, bidPrice
, offerPrice
Initially these fields are set to NULL
. As each message comes into the query engine, the corresponding fields are updated. As we know, the first underlying message is: [from the select * query]
. This first message causes the first update to the Query State, which we see in the output:
#Shell output format
>_,TIMESTAMP,SYMBOL,TYPE,price,bidPrice,offerPrice
0,2011-10-17 17:21:40,GREATCO,EQUITY,_,42.5,43.5
As you see, price is still NULL, because we have not seen a TradeMessage
yet. The next message in the source data set is another BestBidOfferMessage
, this time for XBANK. This new message again updates the bidPrice and offerPrice fields of the Query State, while price is still NULL. The third message (a TradeMessage
) finally updates the price field, and so on. In QQL terminology, the Query State object represents a Polymorphic Union Model (PUM) of the source data.
The following diagram illustrates the flow of data in the query we just ran:
As we mentioned, such a query is useless on market data, as it mixes price from one symbol with quote values from another, as seen in messages #4 and #6:
#Shell format
4,2011-10-17 17:21:43,GREATCO,EQUITY,301.25,43.0,45.0
6,2011-10-17 17:21:44,GREATCO,EQUITY,44.0,295.0,299.5
What we want instead is to track the Query State independently for each symbol. Fortunately, there is a very simple way to make the QQL engine behave the way we want, by specifying the GROUP BY SYMBOL clause. While in relational SQL the GROUP BY
clause is only valid when aggregate values are being selected (such as MAX()
, COUNT()
, etc.), in QQL this clause has the more general effect of determining how many Query States are created, and how they are indexed.
#Shell format
SELECT price, bidPrice, offerPrice FROM tickquerydemo GROUP BY SYMBOL
>_,TIMESTAMP,SYMBOL,TYPE,price,bidPrice,offerPrice
0,2011-10-17 19:21:44,GREATCO,EQUITY,44.0,_,_
1,2011-10-17 19:21:43,XBANK,EQUITY,_,295.0,299.5
!end
Without the GROUP BY SYMBOL
clause, there is only one Query State object for all symbols. When the GROUP BY SYMBOL
clause is present, there is a separate Query State object message created for each distinct symbol. Each new source message updates the fields of the corresponding Query State:
Adding GROUP BY SYMBOL to our polymorphic query instantly makes the result look the way we expect.